knitr::opts_chunk$set(echo = TRUE)
Add album timelines and any information that could be relevant to her music
To fully capture Taylor’s evolution, we considered both quantitative (audio features) and qualitative (natural language processing) aspects of her work. We hypothesized that both her technical sound and the content of her songs would change as she pivoted from a more acoustic, country artist to a pop artist.
To capture the technical side of her sound, we used 11 quantitative audio features provided by Spotify: acousticness, danceability, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, and valence. Spotify's audio-features documentation describes each of these in detail.
To track how her music evolved, we focused on the features we suspected would change most from album to album: danceability, valence, energy, and song length. The plots below show how these features change across albums.
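The per-album averages behind these plots can be sketched in base R. This is a minimal sketch, assuming a data frame with the columns listed in the specification below (`Album`, `Release`, `danceability`, `valence`, `energy`, `Length`); the helper `album_feature_means` and the toy rows are hypothetical and are not the project's data.

```r
# Hypothetical helper: mean of selected features per album, ordered by release year.
album_feature_means <- function(songs,
                                features = c("danceability", "valence", "energy", "Length")) {
  means <- aggregate(songs[features],
                     by = list(Album = songs$Album, Release = songs$Release),
                     FUN = mean)
  means[order(means$Release), ]
}

# Toy rows for illustration only (made-up values, not Spotify data).
songs <- data.frame(
  Album        = c("A", "A", "B", "B"),
  Release      = c(2008, 2008, 2006, 2006),
  danceability = c(0.5, 0.7, 0.4, 0.6),
  valence      = c(0.3, 0.5, 0.6, 0.8),
  energy       = c(0.6, 0.8, 0.5, 0.7),
  Length       = c(200, 240, 180, 220)
)
album_feature_means(songs)
```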
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## ID = col_character(),
## Name = col_character(),
## Length = col_double(),
## danceability = col_double(),
## energy = col_double(),
## key = col_double(),
## loudness = col_double(),
## mode = col_double(),
## speechiness = col_double(),
## acousticness = col_double(),
## instrumentalness = col_double(),
## liveness = col_double(),
## valence = col_double(),
## tempo = col_double(),
## Release = col_double(),
## Album = col_character()
## )
# Clustering Analysis
After developing a general sense of how Taylor’s audio features changed over time, we used clustering analysis to investigate how similar her songs are. Because our initial analysis showed very different mean danceability, energy, length, and valence across release dates, we hypothesized that the songs' audio features would form a distinct cluster for each of the 9 albums considered.
Before clustering, we first tried to determine the optimal number of clusters using the elbow plot below:
The elbow plot begins to flatten out at k = 3. This is surprising, since we had expected the data to cluster around the 9 albums. Additionally, even with 9 centers the clustering explains relatively little of the variance: a little more than 0.5.
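A sketch of how such an elbow curve can be computed, assuming that "explained variance" means the ratio of between-cluster to total sum of squares (`betweenss / totss` from `stats::kmeans`) on standardized features. The helper `elbow_curve` is hypothetical, and the simulated three-group data only stand in for the song features.

```r
# Hypothetical helper: explained variance ratio (between-cluster SS / total SS)
# of k-means for k = 1, ..., k_max on standardized features.
elbow_curve <- function(features, k_max = 9, nstart = 25) {
  x <- scale(as.matrix(features))  # put all features on a common scale
  vapply(seq_len(k_max), function(k) {
    km <- kmeans(x, centers = k, nstart = nstart)
    km$betweenss / km$totss
  }, numeric(1))
}

# Simulated stand-in data with three well-separated groups (not song data).
set.seed(1)
sim <- data.frame(
  energy  = c(rnorm(30, 0), rnorm(30, 10), rnorm(30, 0)),
  valence = c(rnorm(30, 0), rnorm(30, 0), rnorm(30, 10))
)
ratios <- elbow_curve(sim, k_max = 6)
plot(seq_along(ratios), ratios, type = "b",
     xlab = "Number of centers (k)", ylab = "Explained variance ratio")
```

On the simulated data the ratio rises sharply up to k = 3 and then levels off, which is the bend the elbow method looks for. A curve that tops out near 0.5 even at 9 centers, as described above, suggests songs that do not separate into tight groups.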
Using three centers and plotting energy vs. valence, we can see three distinct clusters. Cluster 1 is characterized by low valence and low energy; cluster 2 by high energy and high valence; and cluster 3 by lower valence but higher energy than cluster 1. The album color coding shows that the clusters do not correspond to albums; instead, these two qualities are distributed across multiple albums.
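Cross-tabulating clusters against albums makes this observation measurable. Below is a minimal sketch, assuming a vector of cluster assignments (such as `km$cluster` from `kmeans()`) and a vector of album labels for the same songs. It computes cluster purity, a standard external clustering measure: the share of songs that belong to their cluster's most common album. `cluster_purity` is a hypothetical helper and the labels are toy values.

```r
# Hypothetical helper: cluster purity, i.e. the share of songs that belong to
# the most common album within their cluster (1 = every cluster is one album).
cluster_purity <- function(cluster, album) {
  tab <- table(cluster, album)  # rows: clusters, columns: albums
  sum(apply(tab, 1, max)) / sum(tab)
}

# Toy labels for illustration only.
cluster <- c(1, 1, 1, 2, 2, 2)
album   <- c("Fearless", "Fearless", "Red", "Red", "1989", "1989")
table(cluster, album)
cluster_purity(cluster, album)
```

A purity near 1 would mean each cluster is essentially a single album; clusters that mix several albums push it well below that.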
As in the previous plot, the acousticness vs. energy plot shows that these features are not album-specific but are distributed across albums. The plot also shows the non-linear nature of Taylor’s sound. For example, while her second album, “Fearless”, is highly acoustic, her next album, “Speak Now”, sits on the opposite side of the graph. Taylor then returned to her earlier sound with “Red”, which clusters with “Fearless” at high acousticness.
Next, we compared the clusters by creating bar charts of each cluster's mean feature values:
From the bar charts, acousticness and release year (`Release`) appear to be the main features distinguishing cluster 1, while valence, energy, and danceability distinguish cluster 2.
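Because `Release` is measured in years while most Spotify features lie between 0 and 1, cluster means are easiest to compare on standardized features. A minimal sketch, assuming a numeric feature data frame and one cluster label per song; `cluster_profiles` is a hypothetical helper and the values are made up for illustration.

```r
# Hypothetical helper: mean of each standardized feature within each cluster.
# Standardizing first keeps Release (years) from dwarfing the 0-1 features.
cluster_profiles <- function(features, cluster) {
  z <- as.data.frame(scale(features))
  aggregate(z, by = list(cluster = cluster), FUN = mean)
}

# Toy values for illustration only.
features <- data.frame(
  acousticness = c(0.9, 0.8, 0.1, 0.2),
  energy       = c(0.3, 0.4, 0.8, 0.9),
  Release      = c(2008, 2012, 2017, 2019)
)
profiles <- cluster_profiles(features, cluster = c(1, 1, 2, 2))
profiles

# Bar chart of one cluster's profile (positive = above the overall mean).
barplot(unlist(profiles[1, -1]), main = "Cluster 1 (standardized means)")
```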
The three-center analysis suggests that clustering on audio features does not distinguish albums well. To test this, we increased the number of centers to 9:
Using 9 centers and recreating the graphs from the three-center analysis, the albums become even harder to tell apart: each cluster contains several albums with very different release dates.
As with three centers, the nine-center clustering highlights how similar “Fearless” and “Red” are in their audio features. Most importantly, it shows that Taylor varies both valence and energy across albums regardless of release date.
Next, we recreated the acousticness vs. energy plot using the nine-center clustering. Once again, “Fearless” and “Red” are concentrated in cluster 6, which is characterized by lower energy and higher acousticness. However, this cluster also contains other albums such as “1989”, “Taylor Swift”, and “Speak Now”.
Although the pattern is not cluster-specific, this plot also shows that her most recent albums (“Reputation”, “Lover”, “Evermore”, and “Folklore”) are lower in acousticness and higher in energy, with “1989” acting almost as a transition album between the two zones.
Key takeaways:
- Clustering on song audio features did not distinguish albums well (the explained variance stayed below 0.6). This is likely because Taylor has worked with the same producers throughout her career and therefore strikes a similar balance of features on each album.
- The sound of Taylor’s earlier albums (“Taylor Swift”, “Fearless”, “Speak Now”, “Red”) fluctuated the most, jumping between low energy/high acousticness and high energy/low acousticness.
- “1989” had the greatest variance across individual songs (in both clustering graphs) and acted almost as a transition album to her newer work, which is concentrated in the higher-energy, lower-acousticness zone.
# Reading in data and setting it up for sentiment analysis
get_sentiments('afinn')
## # A tibble: 2,477 x 2
## word value
## <chr> <dbl>
## 1 abandon -2
## 2 abandoned -2
## 3 abandons -2
## 4 abducted -2
## 5 abduction -2
## 6 abductions -2
## 7 abhor -3
## 8 abhorred -3
## 9 abhorrent -3
## 10 abhors -3
## # … with 2,467 more rows
get_sentiments('nrc')
## # A tibble: 13,875 x 2
## word sentiment
## <chr> <chr>
## 1 abacus trust
## 2 abandon fear
## 3 abandon negative
## 4 abandon sadness
## 5 abandoned anger
## 6 abandoned fear
## 7 abandoned negative
## 8 abandoned sadness
## 9 abandonment anger
## 10 abandonment fear
## # … with 13,865 more rows
get_sentiments('bing')
## # A tibble: 6,786 x 2
## word sentiment
## <chr> <chr>
## 1 2-faces negative
## 2 abnormal negative
## 3 abolish negative
## 4 abominable negative
## 5 abominably negative
## 6 abominate negative
## 7 abomination negative
## 8 abort negative
## 9 aborted negative
## 10 aborts negative
## # … with 6,776 more rows
# Taylor Swift Album
ts <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/tswift"))
ts <- tibble(ts)
ts$ts <- as.character(ts$ts)
ts <- ts %>%
unnest_tokens(word, ts)%>%
anti_join(stop_words)%>%
count(word, sort=TRUE)
## Joining, by = "word"
# Fearless Album
fearless <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/fearless"))
fearless <- tibble(fearless)
fearless$fearless <- as.character(fearless$fearless)
fearless <- fearless %>%
unnest_tokens(word, fearless)%>%
anti_join(stop_words)%>%
count(word, sort=TRUE)
## Joining, by = "word"
# Speak Now Album
speak <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/speak_now"))
speak <- tibble(speak)
speak$speak <- as.character(speak$speak)
speak <- speak %>%
unnest_tokens(word, speak)%>%
anti_join(stop_words)%>%
count(word, sort=TRUE)
## Joining, by = "word"
# Red Album
red <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/red"))
red <- tibble(red)
red$red <- as.character(red$red)
red <- red %>%
unnest_tokens(word, red)%>%
anti_join(stop_words)%>%
count(word, sort=TRUE)
## Joining, by = "word"
# 1989 Album
nineteen89 <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/1989"))
nineteen89 <- tibble(nineteen89)
nineteen89$nineteen89 <- as.character(nineteen89$nineteen89)
nineteen89<- nineteen89 %>%
unnest_tokens(word, nineteen89)%>%
anti_join(stop_words)%>%
count(word, sort=TRUE)
## Joining, by = "word"
# Reputation Album
rep <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/rep"))
rep <- tibble(rep)
rep$rep <- as.character(rep$rep)
rep <- rep %>%
unnest_tokens(word, rep)%>%
anti_join(stop_words)%>%
count(word, sort=TRUE)
## Joining, by = "word"
# Lover Album
lover <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/lover"))
lover <- tibble(lover)
lover$lover <- as.character(lover$lover)
lover <- lover %>%
unnest_tokens(word, lover)%>%
anti_join(stop_words)%>%
count(word, sort=TRUE)
## Joining, by = "word"
# Folklore Album
folklore <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/folklore"))
folklore <- tibble(folklore)
folklore$folklore <- as.character(folklore$folklore)
folklore <- folklore %>%
unnest_tokens(word, folklore)%>%
anti_join(stop_words)%>%
count(word, sort=TRUE)
## Joining, by = "word"
# Evermore Album
evermore <- read_lines(url("https://raw.githubusercontent.com/jesslaudie/DS3001-Final-Project/main/album_lyrics/evermore"))
evermore <- tibble(evermore)
evermore$evermore <- as.character(evermore$evermore)
evermore <- evermore %>%
unnest_tokens(word, evermore)%>%
anti_join(stop_words)%>%
count(word, sort=TRUE)
## Joining, by = "word"
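The nine reading chunks above repeat one pipeline: tokenize, drop stop words, count. The following is a simplified base-R analogue of `unnest_tokens()` + `anti_join(stop_words)` + `count(word, sort = TRUE)`, for illustration only. Its regex tokenizer is an assumption and will not match tidytext's tokenizer exactly (for instance, it splits on curly apostrophes), and `stop` here is a plain character vector rather than tidytext's `stop_words` tibble. `count_words` is a hypothetical helper.

```r
# Hypothetical helper: lowercase, split into words, drop stop words, count.
count_words <- function(lines, stop) {
  tokens <- unlist(strsplit(tolower(lines), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens) & !(tokens %in% stop)]
  counts <- table(tokens)
  out <- data.frame(word = as.character(names(counts)),
                    n = as.integer(counts),
                    stringsAsFactors = FALSE)
  out <- out[order(-out$n, out$word), ]  # most frequent first, ties alphabetical
  rownames(out) <- NULL
  out
}

lyrics <- c("Love story, love!", "The story of us")
count_words(lyrics, stop = c("the", "of", "us"))
```

In the project itself, the same idea could be wrapped once as a function of the album slug in the GitHub URL and applied with `lapply()` over the nine albums, instead of copying the chunk nine times.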
# TS Album
ts_affin <- ts %>%
inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = ts_affin,
aes(x=value)
)+
# AFINN scores are integers, so use one bin per score value
geom_histogram(binwidth = 1, color="seagreen", fill="powderblue")+
ggtitle("Taylor Swift Album Sentiment Range")+
theme_minimal()
# Fearless Album
fearless_affin <- fearless %>%
inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = fearless_affin,
aes(x=value)
)+
geom_histogram(binwidth = 1, color="burlywood4", fill="lightgoldenrod2")+
ggtitle("Fearless Album Sentiment Range")+
theme_minimal()
# Speak Now Album
speak_affin <- speak %>%
inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = speak_affin,
aes(x=value)
)+
geom_histogram(binwidth = 1, color="darkmagenta", fill="deeppink3")+
ggtitle("Speak Now Album Sentiment Range")+
theme_minimal()
# Red Album
red_affin <- red %>%
inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = red_affin,
aes(x=value)
)+
geom_histogram(binwidth = 1, color="red4", fill="indianred")+
ggtitle("Red Album Sentiment Range")+
theme_minimal()
# 1989 Album
nineteen89_affin <- nineteen89 %>%
inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = nineteen89_affin,
aes(x=value)
)+
geom_histogram(binwidth = 1, color="blueviolet", fill="thistle2")+
ggtitle("1989 Album Sentiment Range")+
theme_minimal()
# Reputation Album
rep_affin <- rep %>%
inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = rep_affin,
aes(x=value)
)+
geom_histogram(binwidth = 1, color="gray19", fill="gray82")+
ggtitle("Reputation Album Sentiment Range")+
theme_minimal()
# Lover Album
lover_affin <- lover %>%
inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = lover_affin,
aes(x=value)
)+
geom_histogram(binwidth = 1, color="lightskyblue", fill="pink")+
ggtitle("Lover Album Sentiment Range")+
theme_minimal()
# Folklore Album
folklore_affin <- folklore %>%
inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = folklore_affin,
aes(x=value)
)+
geom_histogram(binwidth = 1, color="gray68", fill="gray93")+
ggtitle("Folklore Album Sentiment Range")+
theme_minimal()
# Evermore Album
evermore_affin <- evermore %>%
inner_join(get_sentiments("afinn"))
## Joining, by = "word"
ggplot(data = evermore_affin,
aes(x=value)
)+
geom_histogram(binwidth = 1, color="coral3", fill="navajowhite3")+
ggtitle("Evermore Album Sentiment Range")+
theme_minimal()
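Because each album's words are counted before the join, every AFINN histogram above has one entry per distinct word, not per occurrence. If an occurrence-weighted summary is wanted, a base-R sketch follows. It assumes a word-count table with columns `word` and `n` and a lexicon with columns `word` and `value` (the shape returned by `get_sentiments("afinn")`); the helper `afinn_summary` is hypothetical, the three lexicon rows are copied from the `get_sentiments('afinn')` output shown earlier, and the word counts are toy values.

```r
# Hypothetical helper: mean AFINN score per distinct word and per occurrence.
afinn_summary <- function(word_counts, lexicon) {
  scored <- merge(word_counts, lexicon, by = "word")  # inner join on word
  c(per_word       = mean(scored$value),
    per_occurrence = sum(scored$value * scored$n) / sum(scored$n))
}

# Three AFINN rows as printed earlier; toy word counts for illustration only.
lexicon <- data.frame(word  = c("abandon", "abandoned", "abhor"),
                      value = c(-2, -2, -3))
word_counts <- data.frame(word = c("abandon", "abhor", "love"),
                          n    = c(3, 1, 5))
afinn_summary(word_counts, lexicon)
```

Words missing from the lexicon (here the toy "love" row) are dropped by the inner join, exactly as with `inner_join()` in the chunks above.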
# Taylor Swift Album
set.seed(42)
ggplot(ts[1:50,], aes(label = word, size = n, color = n)
) +
geom_text_wordcloud() +
theme_minimal() + scale_color_gradient(low = "seagreen4", high = "turquoise3") + ggtitle("Taylor Swift Album")
# Fearless
set.seed(42)
ggplot(fearless[1:50,], aes(label = word, size = n, color = n)
) +
geom_text_wordcloud() +
theme_minimal() + scale_color_gradient(low = "goldenrod", high = "burlywood4")+ ggtitle("Fearless Album")
# Speak Now
set.seed(42)
ggplot(speak[1:50,], aes(label = word, size = n, color = n)
) +
geom_text_wordcloud() +
theme_minimal() + scale_color_gradient(low = "deeppink3", high = "darkmagenta") + ggtitle("Speak Now Album")
# Red
set.seed(42)
ggplot(red[1:50,], aes(label = word, size = n, color = n)
) +
geom_text_wordcloud() +
theme_minimal() + scale_color_gradient(low = "indianred", high = "red4")+ ggtitle("Red Album")
# 1989
set.seed(42)
ggplot(nineteen89[1:50,], aes(label = word, size = n, color = n)
) +
geom_text_wordcloud() +
theme_minimal() + scale_color_gradient(low = "mediumpurple1", high = "blueviolet")+ ggtitle ("1989 Album")
# Reputation
set.seed(42)
ggplot(rep[1:50,], aes(label = word, size = n, color = n)
) +
geom_text_wordcloud() +
theme_minimal() + scale_color_gradient(low = "gray66", high = "gray19")+ggtitle("Reputation Album")
# Lover
set.seed(42)
ggplot(lover[1:50,], aes(label = word, size = n, color = n)
) +
geom_text_wordcloud() +
theme_minimal() + scale_color_gradient(low = "palevioletred1", high = "lightskyblue")+ ggtitle("Lover Album")
# Folklore
set.seed(42)
ggplot(folklore[1:50,], aes(label = word, size = n, color = n)
) +
geom_text_wordcloud() +
theme_minimal() + scale_color_gradient(low = "gray68", high = "gray55")+ggtitle("Folklore Album")
# Evermore
set.seed(42)
ggplot(evermore[1:50,], aes(label = word, size = n, color = n)
) +
geom_text_wordcloud() +
theme_minimal() + scale_color_gradient(low = "navajowhite3", high = "lightsalmon2")+ ggtitle("Evermore Album")
# Bing Analysis
# TS Album
ts_bing <- ts %>%
inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(ts_bing$sentiment)
##
## negative positive
## 42 24
# neg 42 pos 24
# Fearless
fearless_bing <- fearless %>%
inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(fearless_bing$sentiment)
##
## negative positive
## 47 44
# neg 47 pos 44
# Speak Now
speak_bing <- speak %>%
inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(speak_bing$sentiment)
##
## negative positive
## 89 47
#neg 89 pos 47
# Red
red_bing <- red %>%
inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(red_bing$sentiment)
##
## negative positive
## 79 59
# neg 79 pos 59
# 1989
nineteen89_bing <- nineteen89 %>%
inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(nineteen89_bing$sentiment)
##
## negative positive
## 74 27
# neg 74 pos 27
# Reputation
rep_bing <- rep %>%
inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(rep_bing$sentiment)
##
## negative positive
## 112 53
#neg 112 pos 53
# Lover
lover_bing <- lover %>%
inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(lover_bing$sentiment)
##
## negative positive
## 99 55
#neg 99 pos 55
# Folklore
folklore_bing <- folklore %>%
inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(folklore_bing$sentiment)
##
## negative positive
## 103 35
# neg 103 pos 35
# Evermore
evermore_bing <- evermore %>%
inner_join(get_sentiments("bing"))
## Joining, by = "word"
table(evermore_bing$sentiment)
##
## negative positive
## 87 55
# neg 87 pos 55
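Similarly, each `table(..._bing$sentiment)` call above counts distinct words, because the join happens after `count()`. An occurrence-weighted count only needs the `n` column. A base-R sketch with a hypothetical helper `bing_totals` and toy joined rows:

```r
# Hypothetical helper: total occurrences of negative and positive words.
bing_totals <- function(scored) {
  tapply(scored$n, scored$sentiment, sum)
}

# Toy joined rows (word, n, sentiment) for illustration only.
scored <- data.frame(word      = c("abnormal", "abolish", "love"),
                     n         = c(4, 1, 10),
                     sentiment = c("negative", "negative", "positive"))
table(scored$sentiment)  # distinct words: negative 2, positive 1
bing_totals(scored)      # occurrences:    negative 5, positive 10
```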
# Creating a dataframe with the negative and positive values for each album and release dates
negative <- c(42, 47, 89, 79, 74, 112, 99, 103, 87)
positive <- c(24, 44, 47, 59, 27, 53, 55, 35, 55)
album <- c("Taylor Swift", "Fearless", "Speak Now", "Red", "1989", "Reputation", "Lover", "Folklore", "Evermore")
release_date <- c(2006, 2008, 2010, 2012, 2014, 2017, 2019, 2020, 2020)
sentiment <- data.frame(album, release_date, negative, positive, stringsAsFactors=TRUE)
View(sentiment)
# Normalizing the values for pos and neg
normalize <- function(x){
(x - min(x)) / (max(x) - min(x))
}
sentiment$negative <- normalize(sentiment$negative)
sentiment$positive <- normalize(sentiment$positive)
View(sentiment)
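Min-max scaling maps each column to [0, 1] independently: for the negative counts, 42 (Taylor Swift) becomes 0 and 112 (Reputation) becomes 1, so Red's 79 becomes (79 - 42) / (112 - 42) ≈ 0.53. Because the two columns are rescaled separately, the result no longer shows how positive and negative words balance within an album, and raw counts also depend on how many distinct words an album has. One alternative, sketched here with the counts from the Bing tables above, is the share of an album's sentiment words that are positive.

```r
# Bing counts of distinct negative/positive words, copied from the tables above.
album    <- c("Taylor Swift", "Fearless", "Speak Now", "Red", "1989",
              "Reputation", "Lover", "Folklore", "Evermore")
negative <- c(42, 47, 89, 79, 74, 112, 99, 103, 87)
positive <- c(24, 44, 47, 59, 27, 53, 55, 35, 55)

# Same min-max scaling as the normalize() function above.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
round(normalize(negative)[album == "Red"], 2)

# Share of sentiment words that are positive: comparable across album sizes.
positive_share <- setNames(positive / (positive + negative), album)
round(sort(positive_share, decreasing = TRUE), 3)
```

With these counts, “Fearless” has the highest positive share (44/91 ≈ 0.484) and “Folklore” the lowest (35/138 ≈ 0.254).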
# Creating graph with just positive and negative values
plot <- ggplot(sentiment, aes(x = positive, y = negative, color = album, label = album)) +
geom_text() +
ggtitle("Negative vs. Positive Sentiment of Albums") +
theme_light()
plot
# Graphing values in 3D plot using 3 variables (neg, pos, and release date)
library(plotly)
fig <- plot_ly(sentiment,
type = "scatter3d",
mode="markers",
x = ~`release_date`,
y = ~`positive`,
z = ~`negative`,
color = ~`album`,
colors = "Paired", # 9 albums: Set2 (the default here) has only 8 colors, Paired has 12
text = ~paste('Album:',album))
fig
Taylor Swift
ts_nrc <- ts %>%
inner_join(get_sentiments("nrc"))
## Joining, by = "word"
View(ts_nrc)
table(ts_nrc$sentiment)
##
## anger anticipation disgust fear joy negative
## 17 24 14 18 28 34
## positive sadness surprise trust
## 48 21 16 28
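Raw NRC counts grow with the number of distinct words in an album, so comparing albums is easier with shares. A minimal sketch using the Taylor Swift counts printed above; note that NRC can assign one word to several categories, so the denominator counts word-category pairs rather than words. The same division can be repeated for each album's table.

```r
# NRC counts for the Taylor Swift album, copied from the table above.
ts_nrc_counts <- c(anger = 17, anticipation = 24, disgust = 14, fear = 18,
                   joy = 28, negative = 34, positive = 48, sadness = 21,
                   surprise = 16, trust = 28)

# Share of each NRC category among all matched word-category pairs.
nrc_share <- ts_nrc_counts / sum(ts_nrc_counts)
round(nrc_share, 3)
```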
Fearless
fearless_nrc <- fearless %>%
inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(fearless_nrc$sentiment)
##
## anger anticipation disgust fear joy negative
## 22 25 10 25 38 57
## positive sadness surprise trust
## 67 27 19 46
Speak Now
speak_nrc <- speak %>%
inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(speak_nrc$sentiment)
##
## anger anticipation disgust fear joy negative
## 33 41 19 50 48 81
## positive sadness surprise trust
## 74 50 32 49
Red
red_nrc <- red %>%
inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(red_nrc$sentiment)
##
## anger anticipation disgust fear joy negative
## 32 36 23 39 40 68
## positive sadness surprise trust
## 84 36 20 44
1989
nineteen89_nrc <- nineteen89 %>%
inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(nineteen89_nrc$sentiment)
##
## anger anticipation disgust fear joy negative
## 27 20 17 36 22 58
## positive sadness surprise trust
## 39 33 12 21
Reputation
rep_nrc <- rep %>%
inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(rep_nrc$sentiment)
##
## anger anticipation disgust fear joy negative
## 49 32 32 62 45 99
## positive sadness surprise trust
## 79 47 26 43
Lover
lover_nrc <- lover %>%
inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(lover_nrc$sentiment)
##
## anger anticipation disgust fear joy negative
## 43 44 25 63 45 89
## positive sadness surprise trust
## 76 44 25 50
Folklore
folklore_nrc <- folklore %>%
inner_join(get_sentiments("nrc"))
## Joining, by = "word"
table(folklore_nrc$sentiment)
Evermore
evermore_nrc <- evermore %>%
inner_join(get_sentiments("nrc"))
## Joining, by = "word"
View(evermore_nrc)
table(evermore_nrc$sentiment)
# Future Work